---
title: "Works Extraction"
date: "June 20, 2025"
format: html
---

# Libraries

```{r}
library(tidyverse)
library(bibliometrix)
library(readxl)
library(stringr)
library(openxlsx)
library(RefManageR)
library(ggplot2)
library(dplyr)
library(tidytext)
library(tidyr)
library(igraph)
library(ggraph)
library(widyr)
library(stopwords)
```

# Information search

Defining the research question

Search terms:

[In Portuguese:]{.underline} (“ocean*” OR “década oceânica” OR “atântico sul”) OR (“direito do mar” OR “CNUDM” OR “ZOPACAS”) AND (“defesa” OR “cooperaç*” OR “segurança”  OR “laços” OR “poder” OR “relaç*” OR “geopolítica” OR “estratég*” OR “política”) OR (“economia azul” OR “justiça azul”) AND (“Brasil” OR “amazôn*” OR “marinha do Brasil”).

[In English:]{.underline} (“ocean*” OR “ocean decade” OR “atlantic south”) OR (“law of the sea” OR “UNCLOS” OR “ZPCSA”) AND (“defense” OR “cooperation” OR “security” OR “bond*” OR “power” OR “relation*” OR “geopolitic*” OR “strateg*” OR “policy”) OR (“blue economy” OR “blue justice”) AND (“Brazil” OR “amazon*” OR “Brazilian marine”).


Search conducted on June 13, 2025

Search databases:

\[In Portuguese:\] Scopus, Google Scholar (Portuguese), Semantic Scholar, and Scielo.

- Scopus: n = 23 (search date: 06/13/2025)
- Google Scholar: n = 100 (search date: 06/13/2025)
- Semantic Scholar: n = 23 (search date: 06/13/2025)
- Scielo: n = 0 (search date: 06/13/2025)

N (total in Portuguese) = 146

\[In English:\] Scopus, Google Scholar (English), and Semantic Scholar.

- Scopus: n = 49 (search date: 06/13/2025)
- Google Scholar: n = 100 (search date: 06/13/2025)
- Semantic Scholar: n = 667 (search date: 06/13/2025)

N (total in English) = 967

Total N (overall) = 1,113

Filter 0 (applied during the initial literature review): publication year ≥ 1988 and ≥ 5 citations.

Total N (after filter 0) = 573

First and second filters: removal of duplicates and missing data (in Zotero): n = 514
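
Filter 0 and the duplicate and missing-data filters were applied outside R (the latter two in Zotero). Purely as an illustration, the same logic could be written with dplyr on a toy table. The column names (`title`, `year`, `cites`) and the title-normalisation rule are assumptions made for this sketch; they are not the fields or the matching rule that Zotero uses.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy records; column names are hypothetical
records <- tribble(
  ~title,                             ~year, ~cites,
  "Blue Economy in Brazil",            2019,     12,
  "Blue economy in Brazil.",           2019,     12,  # duplicate (case, punctuation)
  "South Atlantic Security",           1985,     40,  # published before 1988
  "Ocean Governance and Cooperation",  2021,      2,  # fewer than 5 citations
  NA,                                  2015,      9,  # missing title
  "Law of the Sea and the Amazon",     2010,      7
)

screened <- records %>%
  # Filter 0: publication year >= 1988 and at least 5 citations
  filter(year >= 1988, cites >= 5) %>%
  # Missing data: drop records without a title or a year
  filter(!is.na(title), !is.na(year)) %>%
  # Duplicates: compare on a normalised title plus year
  mutate(title_key = title %>%
           str_to_lower() %>%
           str_replace_all("[[:punct:]]", "") %>%
           str_squish()) %>%
  distinct(title_key, year, .keep_all = TRUE) %>%
  select(-title_key)

nrow(screened)
```

Matching on a normalised title is deliberately crude; a real deduplication would usually also compare DOIs or authors.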

Third filter: only works categorized as journal articles, books, book chapters, and conference papers [grey literature] (in Zotero): n = 447
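
The counts reported so far can be collected in a small table to show how many records each step removed. This is only bookkeeping over the totals stated in this section; the stage labels are paraphrases introduced for the sketch.

```r
library(dplyr)
library(tibble)

# Totals as reported in the text
flow <- tribble(
  ~stage,                                                   ~n,
  "Records identified (all databases)",                   1113,
  "After filter 0 (year and citations)",                   573,
  "After filters 1 and 2 (duplicates and missing data)",   514,
  "After filter 3 (document type)",                        447
) %>%
  # Records removed at each step = previous total minus current total
  mutate(removed = lag(n) - n)

flow
```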

For both Portuguese and English, the Scopus, Google Scholar, and Semantic Scholar databases were accessed using the software Publish or Perish (PoP). A keyword search strategy was used.

The search using the Scielo database was conducted directly via the Portuguese website of the library (scielo.br), using the "all indexes" option.

All search result documents were exported in both .csv and .bib format.

# Loading the database from Zotero after filters 0 to 3

```{r}

# After Zotero treatment

base_zot <- read_csv("C:/Users/auadf/OneDrive/Área de Trabalho/Artigo -Milani/2a pesquisa - definitiva/Minha biblioteca-Zotero.csv")

```

# Filter 4: Only Humanities and Social Sciences

## Dictionary of Natural and Math Sciences

```{r}

# Each entry is a regex fragment matched anywhere in the lower-cased title,
# so entries must be lower case and need no trailing "*"
# (as a regex, "fish*" would match any title containing "fis").
cne_regex <- c(
  "biolog", "medic", "médic", "zoolog", "chem", "quím", "quimica", "físic", "physic",
  "engenhar", "neuro", "cardiolog", "oncolog", "genet", "ecolog", "botan",
  "geolog", "geofísic", "materiais", "material", "biotec", "anatom", "farmac",
  "matemát", "estat", "comput", "informát", "robót", "robot", "veterin",
  "clínic", "cirurg", "patolog", "toxicolog", "ambient", "clim",
  "terra", "planet", "astronom", "astrofísic", "arxiv",
  "oceanograf", "estuár", "costeir", "litoral", "psiquiatria", "algae",
  "pesca", "aquicult", "aquátic", "plâncton", "coral", "hydrokinetic", "recif", "hidrolog",
  "hidrograf", "náutic", "ciência marinh", "ciências do mar", "marés", "ondas",
  "correntes", "salin", "batimetr", "delta", "rock", "microb", "marine",
  "ornith", "ornit", "oil", "óleo", "oie", "sedimentary", "shrimp", "carbonate",
  "gymnotus", "epidemiology", "parasit", "wildlife", "materia", "sens", "herpet",
  "phyllo", "saúde", "health", "animal", "limno", "mycos", "micose", "crop", "vet",
  "toxic", "tóxic", "atom", "átom", "engineer", "cancer", "urology", "engenhar",
  "faun", "ozone", "orbiter", "mars", "plastic", "aerosol", "diclofenac",
  "epidemiol", "fish", "bioaccumulation", "fisheries", "phylogeography",
  "eutrophication", "shark", "sharks", "mangroves", "entom", "zoolo", "heliy",
  "diseas", "doenç", "doent", "herpet", "sens", "oceanograp", "oceanograf",
  "freshwater", "biota", "fish", "immun", "imunol", "spetroscop", "espectro",
  "bentholog", "greek", "environmental sciences", "atmos", "naturalist",
  "maritime studies", "precambrian", "molec", "zoo", "nitrous", "co2", "oxid",
  "icht", "poultry", "nauplius", "acta", "academia brasileira de ciênc", "biogeo",
  "image process", "cosm", "geosc", "geoc", "music", "biotr", "embo", "meteor",
  "geophy", "geof", "nature", "spect", "espectr", "copeia", "aquaculture",
  "scientific reports", "archael", "arqueol", "crustaceana", "ecumenic", "graph",
  "thermal", "technol", "tecnol", "feed", "available at", "asia", "austral", "ecozon")

base_zot_filtro_4 <- base_zot %>%
  filter(
    !str_detect(
      str_to_lower(`Publication Title`),
      str_c(cne_regex, collapse = "|")
    )
  )

nrow(base_zot_filtro_4)
```
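
Because the dictionary entries are short fragments matched anywhere in the title, it can help to record which fragment triggered each exclusion before the manual check. The sketch below does this on toy data; `patterns` and `toy` are stand-ins invented for the example, not the real `cne_regex` or `base_zot`. It also flags records with no publication title separately: `dplyr::filter()` keeps only rows where the condition is `TRUE`, so in the chunk above a missing `Publication Title` yields `NA` and the row is dropped.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy inputs (hypothetical); the real pattern list is `cne_regex`
patterns <- c("marine", "oil", "sens")
toy <- tibble(`Publication Title` = c(
  "Marine Policy", "Remote Sensing", "Contexto Internacional", NA
))

audit <- toy %>%
  mutate(
    title_lower = str_to_lower(`Publication Title`),
    # First fragment found in the title, or NA when none matches
    matched = str_extract(title_lower, str_c(patterns, collapse = "|")),
    status = case_when(
      is.na(title_lower) ~ "no title (dropped by filter)",
      !is.na(matched)    ~ "excluded",
      TRUE               ~ "kept"
    )
  )

audit %>% count(status, matched)
```

Tabulating `matched` makes it easy to spot fragments that exclude journals from the social sciences and humanities by accident.
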

# Filter 5: Manual check

## Export results for manual check

```{r}

write.xlsx(base_zot_filtro_4, "(4)RS_zot_dados_brutos.xlsx")
```

# Load filtered data

Manual screening. One researcher carried out this procedure on June 15, 16, and 17, 2025, and a second researcher validated it. Each work was checked individually by title, abstract, keywords, and the authors’ field of expertise. Works were excluded when they fell outside the social sciences and humanities or when their topic fell outside the scope of the research.

The filtering was based primarily on the authors’ disciplinary affiliation. A work was excluded when its sole author, or all of its co-authors, had no affiliation with the social sciences and humanities [rule]. Defense and Security Studies was categorized as part of the social sciences and humanities, whereas Geophysics and Geology were categorized as outside them. In a few exceptional cases, a work was included for its thematic relevance to the research topic even though its authors did not belong to the social sciences and humanities [exception].
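
The affiliation rule and its exception can be written as a small decision table. The sketch below is only an illustration of the logic: the screening was done manually, and the columns `n_authors_ssh` and `thematic_exception` are hypothetical fields that a screening sheet might contain. It does not cover exclusions made because the topic was out of scope.

```r
library(dplyr)
library(tibble)

# Hypothetical screening sheet: one row per work
screening <- tribble(
  ~work, ~n_authors, ~n_authors_ssh, ~thematic_exception,
  "A",            1,              1,               FALSE,
  "B",            3,              0,               FALSE,
  "C",            2,              1,               FALSE,
  "D",            2,              0,                TRUE
)

decisions <- screening %>%
  mutate(
    # Rule: exclude when no author is affiliated with the SSH
    excluded_by_rule = n_authors_ssh == 0,
    # Exception: thematic relevance overrides the rule
    included = !excluded_by_rule | thematic_exception
  )

decisions
```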

## N = 55

## Additions made during the literature review

Addition from Rafaela’s survey (n = 9, of which 5 did not meet the criteria and 1 could not be verified for citation count; 3 works were included); total n = 58.

Addition from Murilo’s survey (n = 30, including 22 journal articles, 7 grey literature items, and 1 book chapter; 13 works were included: 12 journal articles and 1 conference paper); total n = 71.

Total n = 71

```{r}
# Filtered database (n = 71)
RE_zot_final <- read_excel("C:/Users/auadf/OneDrive/Área de Trabalho/Artigo -Milani/2a pesquisa - definitiva/RE_zot_final.xlsx")
```

# Bibliometric Analysis: Graphics

```{r}
# Publication by year
pub_por_ano <- RE_zot_final %>%
  count(`Publication Year`) %>%
  arrange(`Publication Year`)

ggplot(pub_por_ano, aes(x = as.integer(`Publication Year`), y = n)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_point(color = "steelblue", size = 2) +
  geom_text(aes(label = n), vjust = -0.5, size = 4) +  # add numbers
  labs(
    title = "Publications by Year",
    x = "Year of publication",
    y = "Number of publications"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Journals with the highest number of publications
top_journals <- RE_zot_final %>%
  count(`Publication Title`, sort = TRUE) %>%
  slice_max(n, n = 10) %>%
  mutate(`Publication Title` = reorder(`Publication Title`, n))

ggplot(top_journals, aes(x = n, y = `Publication Title`)) +
  geom_segment(aes(x = 0, xend = n, y = `Publication Title`, yend = `Publication Title`),
               color = "gray70", linewidth = 1) +
  geom_point(color = "steelblue", size = 4) +
  geom_text(aes(x = n + 1, label = n), size = 3.5) +  # number
  labs(
    title = "Top Publishing Journals",
    x = "Number of publications",
    y = "Journal"
  ) +
  theme_minimal() +
  xlim(0, max(top_journals$n) * 1.3)

# Top 10 most cited works
top_cited <- RE_zot_final %>%
  arrange(desc(Cites)) %>%
  slice_head(n = 10) %>%
  # Label as "Author (Year)" to match the axis title
  mutate(Author_Year = paste0(Author, " (", `Publication Year`, ")")) %>%
  mutate(Author_Year = reorder(Author_Year, Cites))

# Lollipop chart
ggplot(top_cited, aes(x = Cites, y = Author_Year)) +
  geom_segment(aes(x = 0, xend = Cites, y = Author_Year, yend = Author_Year),
               color = "gray70", linewidth = 1) +
  geom_point(color = "steelblue", size = 4) +
  geom_text(aes(x = Cites + 1, label = Cites), size = 3.5, hjust = -0.5) +
  labs(
    title = "Top 10 Most Cited Works",
    x = "Number of Citations",
    y = "Author (Year)"
  ) +
  theme_minimal() +
  xlim(0, max(top_cited$Cites) * 1.3)

# Mean of citations per year
cites_por_ano <- RE_zot_final %>%
  group_by(`Publication Year`) %>%
  summarise(mean_cites = mean(Cites, na.rm = TRUE)) %>%
  arrange(`Publication Year`)

# Average citations by year of publication
ggplot(cites_por_ano, aes(x = as.integer(`Publication Year`), y = mean_cites)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_point(color = "steelblue", size = 3) +
  geom_text(aes(label = round(mean_cites, 1)), vjust = -1, size = 3) +
  labs(
    title = "Average Citations by Year of Publication",
    x = "Year of Publication",
    y = "Average Number of Citations"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Total number of citations by year
cites_por_ano <- RE_zot_final %>%
  group_by(`Publication Year`) %>%
  summarise(total_cites = sum(Cites, na.rm = TRUE)) %>%
  arrange(`Publication Year`)

# Total citations by year of publication
ggplot(cites_por_ano, aes(x = as.integer(`Publication Year`), y = total_cites)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_point(color = "steelblue", size = 3) +
  geom_text(aes(label = total_cites), vjust = -1, size = 3) +
  labs(
    title = "Total Citations by Year of Publication",
    x = "Year of Publication",
    y = "Total Number of Citations"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Count the number of works by theoretical framework
framework_counts <- RE_zot_final %>%
  count(Framework, name = "count") %>%
  arrange(desc(count))

# Create lollipop chart
ggplot(framework_counts, aes(x = reorder(Framework, count), y = count)) +
  geom_segment(aes(xend = Framework, y = 0, yend = count), color = "gray70") +
  geom_point(size = 4, color = "steelblue") +
  geom_text(aes(label = count), hjust = -1, size = 3.5) +
  coord_flip() +
  labs(
    x = "Theoretical Framework",
    y = "Number of Works",
    title = "Distribution of Works by Theoretical Framework"
  ) +
  theme_minimal() +
  expand_limits(y = max(framework_counts$count) * 1.1)

# Count the number of publications per year for each framework
yearly_counts <- RE_zot_final %>%
  filter(!is.na(`Publication Year`), !is.na(Framework)) %>%
  count(Framework, `Publication Year`, name = "count")

# Line plot with one line per framework
ggplot(yearly_counts, aes(x = as.integer(`Publication Year`), y = count, color = Framework)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  geom_text(aes(label = count), hjust = -1, size = 3.5) +
  scale_color_brewer(palette = "Accent") +
  labs(
    title = "Yearly Evolution of Publications by Framework",
    x = "Publication Year",
    y = "Number of Works",
    color = "Framework"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    plot.title = element_text(hjust = 0.5)
  ) +
  # Leave headroom above the highest yearly count for the labels
  expand_limits(y = max(yearly_counts$count) * 1.1)

# Word network in titles
stopwords_pt <- stopwords("pt")

stopwords_completas <- c(stop_words$word, stopwords_pt)

palavras_pares <- RE_zot_final %>%
  select(Title) %>%
  mutate(id = row_number()) %>%
  unnest_tokens(word, Title) %>%
  filter(!word %in% stopwords_completas,
         str_detect(word, "[a-z]")) %>%
  pairwise_count(word, id, sort = TRUE, upper = FALSE)

graph <- palavras_pares %>%
  filter(n >= 3) %>%  # adjust the cut
  graph_from_data_frame()

# Detect the clusters
graph_undirected <- as.undirected(graph, mode = "collapse")  # collapse duplicated connections

# Set the seed before clustering: Louvain and the "fr" layout are both stochastic
set.seed(123)

clusters <- cluster_louvain(graph_undirected)
V(graph_undirected)$group <- clusters$membership

ggraph(graph_undirected, layout = "fr") +
  geom_edge_link(alpha = 0.2) +
  geom_node_point(aes(size = degree(graph_undirected), color = as.factor(group)),
                  show.legend = FALSE) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3) +
  theme_void() +
  labs(title = "Word Network in Titles") +
  theme(plot.title = element_text(hjust = 0.5))
```
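
The `n >= 3` cut in the network code keeps only word pairs that co-occur in at least three titles. A toy example, with three invented titles and a lower threshold of 2, shows what `pairwise_count()` counts; the data here are made up for illustration and are not taken from `RE_zot_final`.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(widyr)

# Three invented titles, one row per work
toy_titles <- tibble(
  id = 1:3,
  Title = c("blue economy brazil",
            "blue economy governance",
            "brazil ocean governance")
)

toy_pairs <- toy_titles %>%
  unnest_tokens(word, Title) %>%
  # n = number of titles in which both words appear; each pair listed once
  pairwise_count(word, id, sort = TRUE, upper = FALSE)

# Pairs that appear together in at least two titles
toy_pairs %>% filter(n >= 2)
```

Only "blue" and "economy" share two titles here, so they form the single edge that survives the cut. Raising the threshold therefore thins the network to its most recurrent pairings.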
